AD-755 465


ON  THE  HISTORICAL  DEVELOPMENT  OF  THE 
THEORY  OF  FINITE  INHOMOGENEOUS  MARKOV 
CHAINS 


E. Seneta


Princeton  University 


Prepared  for: 

Office  of  Naval  Research 

January 1973


DISTRIBUTED  BY: 


National Technical Information Service


U.  S.  DEPARTMENT  OF  COMMERCE 

5285  Port  Royal  Road,  Springfield  Va.  22151 






Technical Report 30, Series 2
Department  of  Statistics 
PRINCETON  UNIVERSITY 
January  1973 






Resume 

The main purpose of the note is to compare necessary and
sufficient conditions for weak ergodicity of finite inhomogeneous
Markov chains given by Doeblin (1937) and Hajnal (1958), the
former paper being little known; and more generally to expand on
the nature and consequences of Doeblin's approach as compared
to Hajnal's in some detail. A consequence is some insight into
the relation between various "coefficients of ergodicity".

1. Introduction. In this note all matrices are of fixed size
$n \times n$. Let $\{P_k\}$, $k \ge 1$, be a sequence of stochastic matrices
(i.e. matrices with non-negative entries and unit row sums); and
let $T_{r,k} = \{t_{i,j}^{(r,k)}\}$ be the stochastic matrix defined by

    $T_{r,k} = P_{r+1} P_{r+2} \cdots P_{r+k}$

for $r \ge 0$, $k \ge 1$.

The sequence $\{P_k\}$ is said to be weakly ergodic (in the sense
of Kolmogorov) if for all $i, j, s = 1, \ldots, n$ and $r \ge 0$

(1.1)    $t_{i,s}^{(r,k)} - t_{j,s}^{(r,k)} \to 0$

as $k \to \infty$.
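
As an illustration of definition (1.1), the following minimal Python sketch forms the forward products $T_{r,k}$ for a hypothetical two-state chain and records the largest difference between two rows of $T_{0,k}$. The matrices `A` and `B`, the alternating sequence, and all helper names are assumptions made for this example only; exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def forward_product(P_seq, r, k):
    """T_{r,k} = P_{r+1} P_{r+2} ... P_{r+k}, where P_seq[0] holds P_1."""
    T = P_seq[r]                       # P_{r+1}
    for s in range(r + 1, r + k):      # P_{r+2}, ..., P_{r+k}
        T = matmul(T, P_seq[s])
    return T

def max_row_gap(T):
    """Largest |t_{i,s} - t_{j,s}| over all i, j, s (the quantity in (1.1))."""
    n = len(T)
    return max(abs(T[i][s] - T[j][s])
               for i in range(n) for j in range(n) for s in range(n))

# Hypothetical inhomogeneous chain: P_1 = A, P_2 = B, P_3 = A, ...
A = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
B = [[F(1, 3), F(2, 3)], [F(1, 2), F(1, 2)]]
P_seq = [A, B] * 5

gaps = [max_row_gap(forward_product(P_seq, 0, k)) for k in range(1, 9)]
print(gaps[0], gaps[1])   # 1/4 1/24
```

For this particular sequence the gaps shrink at every step, which is the behaviour that (1.1) requires in the limit.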

The earliest sufficient condition (since it is in large
measure due to Markov himself) for weak ergodicity, as presented
in the textbook of Bernstein (1946), states that weak ergodicity
obtains if

(1.2)    $\sum_{i=1}^{\infty} \lambda(P_i) = \infty$

where, for a stochastic $P = \{p_{i,j}\}$,

(1.3)    $\lambda(P) = \max_j \, (\min_i p_{i,j})$.

In the Russian literature this is known as Markov's theorem; the
final assertion is a consequence of the inequality, valid for all
$i, j = 1, \ldots, n$; $r \ge 0$ and $k \ge 1$,

    $\sum_{s} |t_{i,s}^{(r,k)} - t_{j,s}^{(r,k)}| \le 2 \prod_{s=r+1}^{r+k} (1 - \lambda(P_s))$

i.e.

(1.4)    $a(T_{r,k}) \le \prod_{s=r+1}^{r+k} (1 - \lambda(P_s))$

where

    $a(P) = \tfrac{1}{2} \max_{i,j} \sum_{s} |p_{i,s} - p_{j,s}|$.

The reasoning leading to (1.4) has been substantially refined
in more recent times to yield

(1.5)    $a(T_{r,k}) \le \prod_{s=r+1}^{r+k} a(P_s)$

(Dobrusin, 1956; Paz and Reichaw, 1967). This last inequality
sharpens the well-known one of Hajnal (1958):

(1.6)    $b(T_{r,k}) \le \prod_{s=r+1}^{r+k} \{1 - \beta(P_s)\}$


where

(1.7)    $b(P) = \max_s \max_{i,j} |p_{i,s} - p_{j,s}|$,
         $\beta(P) = \min_{i,j} \sum_s \min(p_{i,s}, p_{j,s})$

since Paz (1970) and Iosifescu (1972) show that

(1.8)    $\beta(P) = 1 - a(P)$

while (1.6) itself implies

    $b(P) \le 1 - \beta(P) = a(P)$.
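
The coefficients introduced so far are simple to evaluate directly from their definitions. The sketch below does so for one hypothetical $3 \times 3$ stochastic matrix and confirms (1.8) and the bound $b(P) \le a(P)$ in that instance; the function names `lam`, `a`, `b`, `beta` and the matrix `P` are choices made for this example, not notation from the papers cited.

```python
from fractions import Fraction as F

def lam(P):
    """(1.3): the largest column minimum."""
    n = len(P)
    return max(min(P[i][s] for i in range(n)) for s in range(n))

def a(P):
    """Half the largest L1 distance between two rows, as used in (1.4)."""
    n = len(P)
    return max(sum(abs(P[i][s] - P[j][s]) for s in range(n))
               for i in range(n) for j in range(n)) / 2

def b(P):
    """(1.7): the largest gap between two entries of one column."""
    n = len(P)
    return max(abs(P[i][s] - P[j][s])
               for s in range(n) for i in range(n) for j in range(n))

def beta(P):
    """(1.7): the smallest overlap between two rows."""
    n = len(P)
    return min(sum(min(P[i][s], P[j][s]) for s in range(n))
               for i in range(n) for j in range(n))

# Hypothetical stochastic matrix with exact rational entries.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(0),    F(1, 2), F(1, 2)],
     [F(1, 4), F(1, 4), F(1, 2)]]

print(lam(P), a(P), b(P), beta(P))   # 1/4 1/2 1/2 1/2
```

Here $\beta(P) = 1 - a(P) = 1/2$, in agreement with (1.8), and $b(P) = a(P)$, so the bound $b(P) \le a(P)$ holds with equality for this matrix.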

The first necessary and sufficient condition for weak ergodicity
is often ascribed to Hajnal (1958, Theorem 3), and there is
little doubt that he gave the first proof involving such a
condition, although it is necessary to mention an analogous and
simultaneous announcement of Sarymsakov (1958), given in a broader
context. However, in a little-known summary paper, Doeblin (1937)
announces a condition of a different kind which he asserts is
necessary and sufficient; and promises publication of this, and other
material announced in the paper, in various periodicals. So far
as the present author can determine, a further paper containing
a proof of this particular result never appeared, possibly due to
Doeblin's premature death in World War II. In actual fact, the
truth of his assertion follows immediately from e.g. that of
Hajnal (1958, p. 239), as we shall note in the sequel. It is
nevertheless interesting to speculate on the manner in which
Doeblin may have arrived at his result in relation to the
knowledge available at the time, and this is the main purpose of the
present note. Such investigation provides some insight into the
relation between various "coefficients of ergodicity" which are
used in the study of such non-homogeneous situations. We confine
ourselves to the case of finite state-space, since it appears
to the present author, from the more recent papers cited above, that
there is some, but no substantial, loss in so doing, as compared
to either the countable or general state space situation, at least
at the present time.

A secondary purpose of this note is to demonstrate that the
development of the theory of inhomogeneous products of finite
stochastic matrices as a whole, as put forward by Hajnal, can be
achieved perhaps more simply by basing one's ideas on the approach
of Doeblin. In particular we shall refer to another characterization
of weak ergodicity (following the necessary and sufficient
condition given above) in Doeblin's paper, which coincides with
Hajnal's Theorem 4; and compare the roles played by "scrambling
matrices" and "Markov matrices" in the two theoretical approaches.

The reader interested in the more recent developments in the
subject should consult the references cited; we mention that
Doeblin's condition itself was motivated by the announcement of a
sufficient condition (which it subsumes) of Ostenc (1934).

2. Coefficients of Ergodicity. We shall denote by the term
coefficient of ergodicity any function $\mu(\cdot)$ continuous on the
set of $(n \times n)$ stochastic matrices $P$, when $P$ is regarded as
a point in Euclidean $n^2$-dimensional space, and satisfying
$0 \le \mu(P) \le 1$. A coefficient of ergodicity shall be called proper
if

(2.1)    $\mu(P) = 1$ if and only if $P = \mathbf{1} v'$

for some probability vector $v$ (i.e. all rows of $P$ are
identical).

We shall be concerned with the situation where $1 - m(\cdot)$ is
a proper coefficient of ergodicity, and $\mu(\cdot)$ a coefficient of
ergodicity (not necessarily proper) such that

(2.2)    $m(P^{(1)} P^{(2)} \cdots P^{(k)}) \le C \prod_{i=1}^{k} (1 - \mu(P^{(i)}))$

for every finite set of stochastic matrices $P^{(i)}$, $i = 1, \ldots, k$,
and every $k$, where $C$ is a constant which may depend on $\mu(\cdot)$
and $m(\cdot)$ (but not on the nature of the finite set of $P$'s chosen).
We see from Section 1 that $\beta(\cdot)$ and $1 - b(\cdot)$ are both proper
coefficients of ergodicity; and (1.4)-(1.6) are all manifestations
of (2.2).
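
The three inequalities (1.4)-(1.6), each an instance of (2.2) with $C = 1$, can be checked numerically on a short product. The following sketch does this for three hypothetical stochastic matrices; the matrices and helper names are assumptions for illustration, and a check on one product is of course no substitute for the proofs in the papers cited.

```python
from fractions import Fraction as F
from functools import reduce
from math import prod

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pairs(n):
    return [(i, j) for i in range(n) for j in range(n)]

def lam(P):   # (1.3)
    n = len(P)
    return max(min(P[i][s] for i in range(n)) for s in range(n))

def a(P):     # coefficient appearing in (1.4) and (1.5)
    n = len(P)
    return max(sum(abs(P[i][s] - P[j][s]) for s in range(n))
               for i, j in pairs(n)) / 2

def b(P):     # (1.7)
    n = len(P)
    return max(abs(P[i][s] - P[j][s]) for s in range(n) for i, j in pairs(n))

def beta(P):  # (1.7)
    n = len(P)
    return min(sum(min(P[i][s], P[j][s]) for s in range(n))
               for i, j in pairs(n))

# Hypothetical matrices P_1, P_2, P_3 (each row sums to one).
Ps = [
    [[F(1, 2), F(1, 2), F(0)], [F(0), F(1, 2), F(1, 2)], [F(1, 4), F(1, 4), F(1, 2)]],
    [[F(1, 3), F(1, 3), F(1, 3)], [F(1, 2), F(0), F(1, 2)], [F(0), F(1, 2), F(1, 2)]],
    [[F(1), F(0), F(0)], [F(1, 2), F(1, 2), F(0)], [F(1, 4), F(1, 4), F(1, 2)]],
]
T = reduce(matmul, Ps)            # T_{0,3} = P_1 P_2 P_3

# (left-hand side, right-hand side) of each inequality
bounds = {
    "(1.4)": (a(T), prod(1 - lam(P) for P in Ps)),
    "(1.5)": (a(T), prod(a(P) for P in Ps)),
    "(1.6)": (b(T), prod(1 - beta(P) for P in Ps)),
}
holds = {name: lhs <= rhs for name, (lhs, rhs) in bounds.items()}
print(holds)
```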

The following proposition is a consequence of these definitions.
The proof is omitted: it is totally analogous to the short
demonstration of Hajnal's Theorem 3, although Hajnal deals with
specific coefficients. (The ideas of the proof occur elsewhere in
the present note in any case.)


Theorem 1. Suppose that we are given $m$ and $\mu$ such that (2.2)
is satisfied (for both parts of this theorem). A given sequence
$\{P_i\}$ of stochastic matrices is weakly ergodic if there exists a
strictly increasing subsequence $\{i_j\}$, $j = 1, 2, \ldots$, of the
positive integers such that

(2.3)    $\sum_{j=1}^{\infty} \mu(T_{i_j,\, i_{j+1} - i_j}) = \infty$.

Conversely, if $\{P_i\}$ is a weakly ergodic sequence, and $\mu(\cdot)$
of (2.2) is also proper, then (2.3) is satisfied for some
strictly increasing subsequence $\{i_j\}$ of the positive integers.

Corollary. If both the $\mu$ and $1 - m$ of (2.2) are proper,
then (2.3) is both necessary and sufficient for weak ergodicity
of a specific sequence $\{P_i\}$ of stochastic matrices.

Thus a necessary and sufficient condition can be
formulated in terms of any two specific proper coefficients
of ergodicity for which (2.2) can be shown to hold. The
difficulty occurs in demonstrating this last; the more
difficult part of e.g. Hajnal's paper lies in demonstrating that
(2.2) holds, which as can be seen from (1.6) is attained with

(2.4)    $C = 1$,  $\mu(P) = \beta(P)$,  $m(P) = b(P)$.

The $\lambda(\cdot)$ defined by (1.3) is not a proper coefficient of
ergodicity, and, while the sufficiency part of Theorem 1 gives
Markov's theorem, $\lambda$ cannot be used directly in formulating
a necessary and sufficient condition.

Now, as Hajnal points out in a slightly different context,
clearly, for every $P$

(2.5)    $\beta(P) \ge \alpha(P) \;(\ge \lambda(P))$

where

(2.6)    $\alpha(P) = \sum_{s} (\min_i p_{i,s})$.

It is readily checked that $\alpha(P)$ is a proper coefficient of
ergodicity, and in view of (2.5) and (1.6) may be used with the
$m(P)$ of (2.4) in a specific instance of Theorem 1 to give

(2.7)    $\sum_{j=1}^{\infty} \alpha(T_{i_j,\, i_{j+1} - i_j}) = \infty$

as a necessary and sufficient condition for weak ergodicity of
a specific sequence $\{P_i\}$. This is Doeblin's assertion.
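
The ordering (2.5), and the difference between a proper and a non-proper coefficient, can be seen on small examples. In the sketch below `P` is a hypothetical stochastic matrix and `R` a hypothetical matrix with identical rows; both, like the helper names, are assumptions for this illustration. For `R` one finds $\alpha = \beta = 1$ while $\lambda < 1$, which is what it means for $\lambda$ not to be proper.

```python
from fractions import Fraction as F

def lam(P):
    """(1.3): the largest column minimum."""
    n = len(P)
    return max(min(P[i][s] for i in range(n)) for s in range(n))

def alpha(P):
    """(2.6): the sum of the column minima."""
    n = len(P)
    return sum(min(P[i][s] for i in range(n)) for s in range(n))

def beta(P):
    """(1.7): the smallest overlap between two rows."""
    n = len(P)
    return min(sum(min(P[i][s], P[j][s]) for s in range(n))
               for i in range(n) for j in range(n))

# Hypothetical stochastic matrix.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(0),    F(1, 2), F(1, 2)],
     [F(1, 4), F(1, 4), F(1, 2)]]

# Hypothetical matrix with identical rows, i.e. of the form 1 v'.
R = [[F(1, 5), F(3, 10), F(1, 2)] for _ in range(3)]

print(beta(P), alpha(P), lam(P))   # 1/2 1/4 1/4
print(beta(R), alpha(R), lam(R))   # 1 1 1/2
```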

It is, however, possible to arrive at the assertion
that (2.7) is sufficient for weak ergodicity directly from
an application of Markov's theorem. (The necessity of the
condition (2.3) in Theorem 1, as also for this particular case,
hinges only on the fact that if weak ergodicity obtains,
$\mu(T_{r,k}) \to 1$ for each $r \ge 0$ as $k \to \infty$.) It appears not
unlikely that this is the manner in which Doeblin proceeded.

We formulate the "comparison" principle involved in general
terms first.

Lemma 1. Suppose that (2.2) is satisfied for some $m$ and $\mu$
($\mu$ not necessarily proper); and let $\nu(\cdot)$ be any coefficient
of ergodicity (not necessarily proper). If for any sequence
$\{P^{(i)}\}$ of stochastic matrices for which the left-hand side
diverges

(2.8)    $\sum_{i=1}^{\infty} \nu(P^{(i)}) = \infty \;\Rightarrow\; \sum_{i=1}^{\infty} \mu(P^{(i)}) = \infty$,

then for a particular sequence $\{P_i\}$ the existence of a
strictly increasing subsequence $\{i_j\}$ of the positive integers such
that

(2.9)    $\sum_{j=1}^{\infty} \nu(T_{i_j,\, i_{j+1} - i_j}) = \infty$

is sufficient for the weak ergodicity of $\{P_i\}$.

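Proof: Take $r \ge 0$ fixed but arbitrary, and consider $T_{r,k}$ for
$k$ large, with $j_1$ such that $i_{j_1}$ is the minimal number
of the sequence $\{i_j\}$ to satisfy $i_j \ge r + 1$; and $i_{j(k)}$ the
maximal number to satisfy $i_j \le k + r$.

Then since

    $T_{r,k} = T_{r,\, i_{j_1} - r} \; T_{i_{j_1},\, i_{j(k)} - i_{j_1}} \; T_{i_{j(k)},\, k + r - i_{j(k)}}$

    $\phantom{T_{r,k}} = T_{r,\, i_{j_1} - r} \Big\{ \prod_{j=j_1}^{j(k)-1} T_{i_j,\, i_{j+1} - i_j} \Big\} T_{i_{j(k)},\, k + r - i_{j(k)}}$

it follows from (2.2) that

    $m(T_{r,k}) \le C \, (1 - \mu(T_{r,\, i_{j_1} - r})) \Big\{ \prod_{j=j_1}^{j(k)-1} (1 - \mu(T_{i_j,\, i_{j+1} - i_j})) \Big\} (1 - \mu(T_{i_{j(k)},\, k + r - i_{j(k)}}))$

    $\phantom{m(T_{r,k})} \le C \prod_{j=j_1}^{j(k)-1} (1 - \mu(T_{i_j,\, i_{j+1} - i_j}))$

and the right hand side diverges to zero as $k \to \infty$, in view of
(2.8) and (2.9).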

Corollary. The coefficient of ergodicity $\alpha(\cdot)$ defined by (2.6)
satisfies condition (2.8), with $\mu = \lambda$, and $m = a$.

Proof:

    $\sum_{i=1}^{\infty} \alpha(P^{(i)}) = \sum_{i=1}^{\infty} \sum_{s=1}^{n} (\min_r p_{r,s}^{(i)}) = \sum_{s=1}^{n} \sum_{i=1}^{\infty} (\min_r p_{r,s}^{(i)})$

so that divergence of the left hand side implies

    $\sum_{i=1}^{\infty} (\min_r p_{r,s}^{(i)}) = \infty$  for some $s$;

which in turn implies

    $\sum_{i=1}^{\infty} \max_s (\min_r p_{r,s}^{(i)}) = \sum_{i=1}^{\infty} \lambda(P^{(i)}) = \infty$.

This corollary is merely a manifestation in part of the
obviously close relation between $\lambda(P)$ and $\alpha(P)$; clearly
$\lambda(P) > 0$ if and only if $\alpha(P) > 0$ (so a Markov matrix is
equivalently defined by either requirement, as will be seen
from its definition in §3.2).
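
The argument of the corollary can be read off the elementary bounds $\lambda(P) \le \alpha(P) \le n \lambda(P)$: a sum of $n$ column minima is at least as large as the largest of them and at most $n$ times as large. The sketch below compares partial sums for a hypothetical two-state sequence `P_i`; the sequence, the cut-off `N`, and the helper names are assumptions made for illustration.

```python
from fractions import Fraction as F

def lam(P):
    """(1.3): the largest column minimum."""
    n = len(P)
    return max(min(P[i][s] for i in range(n)) for s in range(n))

def alpha(P):
    """(2.6): the sum of the column minima."""
    n = len(P)
    return sum(min(P[i][s] for i in range(n)) for s in range(n))

def P_i(i):
    """Hypothetical two-state stochastic matrix for i = 1, 2, ..."""
    return [[F(1, i + 1), F(i, i + 1)],
            [F(1, 2),     F(1, 2)]]

n, N = 2, 50
sum_alpha = sum(alpha(P_i(i)) for i in range(1, N + 1))
sum_lam = sum(lam(P_i(i)) for i in range(1, N + 1))

# Partial sums obey sum_lam <= sum_alpha <= n * sum_lam, so the two
# series diverge or converge together.
print(sum_lam, float(sum_alpha), n * sum_lam)
```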


3. Comparison Between the Two Approaches. In this section
we shall focus attention on a brief, direct comparison of the
proper coefficients of ergodicity $\alpha(\cdot)$ and $\beta(\cdot)$ with a
view to demonstrating that, insofar as the theoretical matters
pertaining to weak ergodicity touched on in Hajnal's paper
are concerned, either may be used with equal convenience.

3.1. Coincidence Probabilities.

If we consider each of two systems independently undergoing
trials of an inhomogeneous Markov chain governed
by the sequence $\{P_i\}$, then Doeblin asserts that no matter
at which state (corresponding to one of the integers
$1, \ldots, n$) each of the systems begins, they will be in the same
state at the same time on an infinite number of occasions with
probability 1, if and only if the sequence $\{P_i\}$ is weakly
ergodic.

The same proposition is stated and proved in Theorem 4
of Hajnal's paper.

The necessity of weak ergodicity in this proof is not
related to coefficients of ergodicity; the proof of sufficiency,
however, leans heavily on the inequality

(3.1)    $\sum_{s=1}^{n} p_{1,s} \, p_{2,s} \ge \beta^2(P)/n$

for any $P$. Now

    $\sum_{s=1}^{n} p_{1,s} \, p_{2,s} \ge \sum_{s=1}^{n} (\min_i p_{i,s})^2 \ge \frac{1}{n} \Big( \sum_{s=1}^{n} \min_i p_{i,s} \Big)^2$

by the Cauchy-Schwarz inequality, so that

    $\sum_{s=1}^{n} p_{1,s} \, p_{2,s} \ge \alpha^2(P)/n$

and the remainder of Hajnal's proof of sufficiency holds in
terms of $\alpha(\cdot)$.
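
The sum on the left of (3.1) is the probability that two systems, currently in two given states, occupy the same state after one further independent step. The sketch below evaluates it for every pair of rows of a hypothetical stochastic matrix and compares the smallest value with $\beta^2(P)/n$ and $\alpha^2(P)/n$; the matrix and the function names are assumptions for this example.

```python
from fractions import Fraction as F

def alpha(P):
    """(2.6): the sum of the column minima."""
    n = len(P)
    return sum(min(P[i][s] for i in range(n)) for s in range(n))

def beta(P):
    """(1.7): the smallest overlap between two rows."""
    n = len(P)
    return min(sum(min(P[i][s], P[j][s]) for s in range(n))
               for i in range(n) for j in range(n))

def coincidence(P, i, j):
    """One-step coincidence probability from states i and j."""
    return sum(P[i][s] * P[j][s] for s in range(len(P)))

# Hypothetical stochastic matrix.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(0),    F(1, 2), F(1, 2)],
     [F(1, 4), F(1, 4), F(1, 2)]]

n = len(P)
worst = min(coincidence(P, i, j) for i in range(n) for j in range(n))
print(worst, beta(P) ** 2 / n, alpha(P) ** 2 / n)   # 1/4 1/12 1/48
```

Since $\beta(P) \ge \alpha(P)$ by (2.5), the bound in terms of $\alpha$ is the weaker of the two, but it is all that the sufficiency argument requires.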

3.2. Scrambling Matrices and Markov Matrices.

A stochastic matrix $P$ is called scrambling if and only
if $\beta(P) > 0$ where $\beta(\cdot)$ is defined in (2.4). A stochastic
matrix $P$ is called Markov if and only if $\lambda(P) > 0$ where
$\lambda(P)$ is defined in (1.3). In his Lemma 1, Hajnal shows that
the scrambling property is monotone and preserved in a
product, whatever other stochastic matrices may follow a
scrambling matrix, by showing that for any stochastic
$P = \{p_{i,j}\}$ and $Q = \{q_{i,j}\}$

    $\beta(P) \le \beta(PQ)$.


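The same is true of $\alpha(\cdot)$, for

    $\alpha(PQ) = \sum_{j=1}^{n} \min_i \Big\{ \sum_{k=1}^{n} p_{i,k} \, q_{k,j} \Big\}$

    $\phantom{\alpha(PQ)} \ge \sum_{j=1}^{n} \sum_{k=1}^{n} (\min_i p_{i,k}) \, q_{k,j}$

    $\phantom{\alpha(PQ)} = \sum_{k=1}^{n} (\min_i p_{i,k}) = \alpha(P)$

so that

    $\alpha(P) \le \alpha(PQ)$.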

It is also true, more fundamentally, that analogously
to Hajnal's Lemma 2, if either $P$ or $Q$ is a Markov matrix,
then so is $PQ$ (a Markov stochastic matrix, recall, is merely
one with an entirely positive column).

There remains only one result, Hajnal's Theorem 1, which
we have not touched on implicitly or explicitly, in his
development of weak ergodicity theory. This theorem characterizes
scrambling matrices in terms of regular matrices (a regular
stochastic matrix is one having a single eigenvalue of
modulus unity, counting repeated eigenvalues as distinct),
so it is not possible to find an analogue in this framework
for Markov matrices.

We mention, however, one more result, important in
applications, where Markov matrices are just as convenient as
scrambling matrices. Let $G_1$ be the class of $(n \times n)$ regular
stochastic matrices, and let $M$ be the class of $(n \times n)$
Markov matrices. Let $t$ be the number of distinct types (with
regard to location of positive elements, but not their actual
values) of matrices in $G_1$. Finally, let $\{P_i\}$ be a
sequence of stochastic matrices.

Theorem. If for each $r \ge 0$, $T_{r,k} \in G_1$ for all $k \ge 1$,
then $T_{r,k} \in M$ for $k \ge t + 1$.

This result is due to Sarymsakov and Mustafin (1957), although
the reader may prefer the simpler approach of Wolfowitz (1963,
Lemmas 3 and 4), where the word "scrambling" may be replaced by
"Markov" without altering the proofs.

The remarks of this section may serve to indicate (and
the theme is further expanded in the book of the present
author: Seneta, 1973, Chapter 4) that, in spite of the fact
that the notion of a scrambling stochastic matrix may be regarded
as the more fundamental, since a matrix may be scrambling
but not Markov, frequently the simpler, and much earlier,
notion of a Markov matrix will suffice.
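
A concrete matrix that is scrambling but not Markov may help to fix the distinction. The hypothetical matrix `S` below (an assumption for this example, as are the helper names) has every pair of rows overlapping, so $\beta(S) > 0$, yet no column is entirely positive, so $\lambda(S) = 0$; its square, on the other hand, is already Markov.

```python
from fractions import Fraction as F

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lam(P):
    """(1.3): positive exactly when P is a Markov matrix."""
    n = len(P)
    return max(min(P[i][s] for i in range(n)) for s in range(n))

def beta(P):
    """(1.7): positive exactly when P is a scrambling matrix."""
    n = len(P)
    return min(sum(min(P[i][s], P[j][s]) for s in range(n))
               for i in range(n) for j in range(n))

# Hypothetical stochastic matrix: each pair of rows shares a positive
# column, but every column contains a zero.
S = [[F(1, 2), F(1, 2), F(0)],
     [F(0),    F(1, 2), F(1, 2)],
     [F(1, 2), F(0),    F(1, 2)]]

S2 = matmul(S, S)
print(beta(S), lam(S), lam(S2))   # 1/2 0 1/4
```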

A  historical  note  on  the  concept  of  "scrambling  matrix" 
itself  (apart  from  the  marginal  reference  to  Dobrusin  already 
cited):  it  appears  to  have  been  exploited  by  Sarymsakov  (1956, 
1958)  as  well  as  Hajnal  (1958). 